Genome-resolved carbon processing potential of tropical peat microbiomes from an oil palm plantation

Tropical peatlands in South-East Asia are some of the most carbon-dense ecosystems in the world. Extensive repurposing of such peatlands for forestry and agriculture has resulted in substantial microbially-driven carbon emissions. However, we lack an understanding of the microorganisms and metabolic pathways involved in carbon turnover. Here, we address this gap by reconstructing 764 sub-species-level genomes from peat microbiomes sampled from an oil palm plantation located on a peatland in Indonesia. The 764 genomes cluster into 333 microbial species (245 bacterial and 88 archaeal), of which 47 are near-complete (completeness ≥90%, redundancy ≤5%, number of unique tRNAs ≥18) and 170 are substantially complete (completeness ≥70%, redundancy ≤10%). The capacity to respire amino acids, fatty acids, and polysaccharides was widespread in both bacterial and archaeal genomes. In contrast, the ability to sequester carbon was detected in only a few bacterial genomes. We expect our collection of reference genomes to help fill some of the existing knowledge gaps about microbial diversity and carbon metabolism in tropical peatlands.

Here, we deeply sequenced peat metagenomes proximal to and distant from oil palm trees in an oil palm plantation. Oil palm plantations on peatlands are of particular interest as they are hotspots of microbially-driven carbon emissions. Using assembly-based approaches, we reconstructed 764 sub-species-level (99% gANI) metagenome-assembled genomes (MAGs) with completeness ≥50% and redundancy ≤10%, which cluster into 333 species-level (95% gANI) MAGs (245 bacterial and 88 archaeal) (Fig. 1). Of these, 38 bacterial and 13 archaeal genomes are near-complete (completeness ≥90%, redundancy ≤5%, number of unique tRNAs ≥18), while an additional 207 bacterial and 91 archaeal genomes are substantially complete (completeness ≥70%, redundancy ≤10%). The MAGs have a median size of 3.23 Mbp (range: 0.43-10.91 Mbp) and a median N50 of 6.34 kbp (range: 3.3-104.84 kbp), and encode a total of 2,530,130 protein-coding genes. The sub-species-level collection spans 14 different bacterial and archaeal phyla, with a majority (53.1%) belonging to the phyla Acidobacteriota and Thermoplasmatota, both of which occur widely in peatlands and in acidic soils 14,15. Within these phyla, the MAGs provide maximum phylogenetic gain for the orders UBA7540 (13 genomes; phylogenetic gain: 33%) and UBA184 (56 genomes; phylogenetic gain: 76.72%). To our knowledge, this catalogue represents the largest collection of microbial genomes from a tropical peat ecosystem.
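The quality tiers quoted above can be expressed as a small classifier. The following is a minimal sketch in Python, not the pipeline used in this study: the class and function names are hypothetical, the thresholds are the ones stated in the text, and the label "medium quality" for the inclusion threshold (completeness ≥50%, redundancy ≤10%) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MagQuality:
    """Quality estimates for one MAG (hypothetical container)."""
    completeness: float  # percent
    redundancy: float    # percent
    unique_trnas: int    # number of distinct tRNAs detected


def quality_tier(mag: MagQuality) -> Optional[str]:
    """Return the highest tier the MAG satisfies, or None if it fails
    the inclusion threshold (completeness >= 50%, redundancy <= 10%)."""
    if mag.completeness >= 90 and mag.redundancy <= 5 and mag.unique_trnas >= 18:
        return "near-complete"
    if mag.completeness >= 70 and mag.redundancy <= 10:
        return "substantially complete"
    if mag.completeness >= 50 and mag.redundancy <= 10:
        return "medium quality"
    return None
```

Note that a genome with high completeness but fewer than 18 unique tRNAs falls through to the "substantially complete" tier under this reading of the criteria.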
Carbon-processing potential of MAGs was determined using a comprehensive marker-gene-based approach which integrates gene functional annotations from multiple databases such as KEGG 16, dbCAN 17, PFAM 18, and TIGRFAM 19. The ability to respire a broad spectrum of carbon substrates such as amino acids, polysaccharides, and fatty acids was widespread across both bacterial and archaeal species, but the capacity to fix carbon was detected in only a few genomes (Fig. 2). Fermentative pathways which produce alcohols and organic acids such as ethanol and acetate, as well as hydrogen and carbon dioxide, were also prevalent. None of the archaeal genomes encode pathways to convert fermentative end-products into methane; however, the capacity to oxidise methane was detected in a small fraction of MAGs (34 MAGs; 10.2%) from the phyla Acidobacteriota, Actinobacteriota, Proteobacteria, Desulfobacterota_B, Thermoplasmatota, Thermoproteota, and Micrarchaeota. In contrast, the capacity to oxidise non-methane trace gases such as methanol (93 MAGs; 27.9%), ethanol (69 MAGs; 20.7%), hydrogen (103 MAGs; 30.9%), and carbon monoxide (177 MAGs; 53.2%) was detected in several MAGs. Interestingly, 38 of the 93 MAGs capable of oxidising methanol belong to the phylum Acidobacteriota, members of which are not typically linked to methanol consumption.
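The percentages quoted above are consistent with counts taken over the 333 species-level MAGs. A minimal sketch of that arithmetic in Python follows; the dictionary and function names are hypothetical, and the denominator of 333 is inferred from the reported figures rather than stated explicitly alongside them.

```python
# Counts of MAGs with each oxidation capacity, as reported in the text.
TOTAL_SPECIES_LEVEL_MAGS = 333  # assumed denominator (species-level MAGs)

OXIDATION_COUNTS = {
    "methane": 34,
    "methanol": 93,
    "ethanol": 69,
    "hydrogen": 103,
    "carbon monoxide": 177,
}


def prevalence_percent(count: int, total: int = TOTAL_SPECIES_LEVEL_MAGS) -> float:
    """Share of MAGs carrying a function, as a percentage to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for substrate, count in OXIDATION_COUNTS.items():
        print(f"{substrate}: {count} MAGs ({prevalence_percent(count)}%)")
```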
Overall, we expect our genome database and metagenomes to be widely useful as a reference for metatranscriptomic experiments, comparative studies, and genome-guided isolation efforts. The availability of statistics describing the prevalence of carbon-processing functions across microbial populations will help fill existing knowledge gaps about their diversity, distribution, and metabolism. These data are particularly timely as carbon emissions from repurposed tropical peatlands continue to accelerate at an unprecedented rate, posing a grave threat to our climate.

Methods
Sample collection. Peat samples proximal to (0.5-1 m) and distant from (≥5 m) oil palm trees were collected as part of a time-stamped fertilizer intervention experiment in an oil palm plantation located in Jambi, Indonesia (103°49ʹ32.23ʺ E, 1°40ʹ58.24ʺ S). The plantation was considered young as the age of drainage was ≤10 years 8. The sampling location, local weather conditions, and peat physicochemical parameters have been previously described 8. The mineral fertilizer intervention involved a single application of NPK (16:16:16; P as P₂O₅ and K as K₂O; 1.6-1.8 kg palm⁻¹) and urea (0.5-1 kg palm⁻¹) following local practices 20,21. Peat samples were collected from four oil palm trees across two time-points before (day 1, dated 2015-01-14, and day 4) and four time-points after (days 6, 7, 10, and 14) fertilizer application. All peat samples were collected from a depth of 0-20 cm using an auger and flash frozen in liquid nitrogen on site.
Metagenome sequencing and assembly. Genomic DNA was extracted from all samples using the Zymo Research Soil MidiPrep kit (Zymo Research, CA, USA). Shotgun DNA libraries were prepared from a total of 36 samples using the TruSeq DNA library preparation kit (Illumina, San Diego, CA, USA) with 2 × 250 bp chemistry, and sequenced on the Illumina HiSeq 2500 (Illumina, San Diego, CA, USA) at SCELSE (https://www.scelse.sg), Nanyang Technological University, Singapore. We generated a total of 133.7 Gbp of raw sequence data, with each sample containing 3.8 Gbp on average.
Functional annotation. Carbon-processing potential of the MAGs was estimated using METABOLIC v4.0 49, which integrates functional annotations from KEGG 16, dbCAN2 17, PFAM 18, TIGRFAM 19, and custom HMMs for specific metabolic functions. A metabolic pathway was considered present if the MAG contained at least one associated marker gene, and absent otherwise. The presence/absence of carbon-processing pathways in MAGs is available on figshare 27 in the file "jopf_carbon_processing_pathways.csv".
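The presence/absence rule described above (a pathway is present when at least one of its marker genes is found) can be illustrated with a minimal Python sketch. This is not METABOLIC's implementation: the function name, the pathway names, and the marker identifiers ("marker_A" and so on) are hypothetical placeholders, and the input is assumed to be a set of marker-gene hits already called for a single MAG.

```python
from typing import Dict, Set


def pathway_presence(
    mag_marker_hits: Set[str],
    pathway_markers: Dict[str, Set[str]],
) -> Dict[str, bool]:
    """Mark each pathway present if the MAG carries at least one of its
    associated marker genes, and absent otherwise."""
    return {
        pathway: bool(markers & mag_marker_hits)
        for pathway, markers in pathway_markers.items()
    }


# Hypothetical pathway-to-marker mapping, for illustration only.
EXAMPLE_PATHWAY_MARKERS = {
    "methanol oxidation": {"marker_A", "marker_B"},
    "carbon monoxide oxidation": {"marker_C"},
    "carbon fixation": {"marker_D", "marker_E"},
}

# Hypothetical marker hits detected in one MAG.
example_hits = {"marker_B", "marker_C", "marker_X"}

example_calls = pathway_presence(example_hits, EXAMPLE_PATHWAY_MARKERS)
```

Because a single marker hit is sufficient under this rule, a positive call indicates potential rather than a complete pathway, which is worth bearing in mind when interpreting the presence/absence table.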

Data Records
Raw metagenomes and metagenome-assembled genomes are available on NCBI BioProject PRJNA883528 50. Datasets and data products generated from the raw data are available on figshare 27.

Technical Validation
MAGs reported in this study consist only of those that met the medium quality threshold or above as defined in Bowers et al. 51.

Usage Notes
Users/researchers should independently assess the accuracy of genes, contigs, and functional assignments for genomes of interest prior to downstream analysis.

Code Availability
Open-source software packages were used to process data and generate data products. Software versions and nondefault parameters are specified where required.